Apprentissage automatique d'un chunker pour le français (Machine Learning of a chunker for French) [in French]
Abstract
We describe in this paper how to automatically learn a chunker for French from the French Tree Bank, using CRFs (Conditional Random Fields). We ran several experiments, either recognizing every possible kind of chunk or focusing on simple nominal phrases only. To measure its robustness, we evaluate the resulting chunker on internal data (i.e., also extracted from the French Tree Bank) as well as on external data (i.e., from a distinct corpus). Keywords: chunking, machine learning, French Tree Bank, CRF.
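As an illustration of how chunking can be cast as a sequence-labelling problem suitable for a CRF, here is a minimal sketch that converts typed chunks into per-token BIO labels and back. The abstract does not state which label encoding the authors used, so the BIO scheme, the chunk tags (`NP`, `VN`, `PP`), the example sentence, and the function names `chunks_to_bio` and `bio_to_chunks` are all assumptions made for illustration.

```python
from typing import List, Tuple

# A chunk is (chunk type, tokens). The type "O" marks a token outside any chunk.
Chunk = Tuple[str, List[str]]

def chunks_to_bio(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten typed chunks into (token, BIO label) pairs."""
    labelled = []
    for chunk_type, tokens in chunks:
        for i, token in enumerate(tokens):
            if chunk_type == "O":
                labelled.append((token, "O"))
            else:
                # First token of a chunk gets B-, the following ones get I-.
                prefix = "B" if i == 0 else "I"
                labelled.append((token, f"{prefix}-{chunk_type}"))
    return labelled

def bio_to_chunks(labelled: List[Tuple[str, str]]) -> List[Chunk]:
    """Rebuild chunks from (token, BIO label) pairs.

    Each "O" token becomes its own single-token ("O", [token]) entry.
    An I- label that does not continue the previous chunk opens a new one.
    """
    chunks: List[Chunk] = []
    for token, label in labelled:
        if label == "O":
            chunks.append(("O", [token]))
        elif label.startswith("B-") or not chunks or chunks[-1][0] != label[2:]:
            chunks.append((label[2:], [token]))
        else:
            chunks[-1][1].append(token)
    return chunks

# Hypothetical example sentence with illustrative chunk tags.
EXAMPLE: List[Chunk] = [
    ("NP", ["Le", "chat", "noir"]),
    ("VN", ["dort"]),
    ("PP", ["sur", "le", "canapé"]),
    ("O", ["."]),
]
```

With labels in this form, a CRF can be trained to predict one label per token from features such as the word form and its POS tag. Restricting the task to simple nominal phrases would amount to keeping only the `NP` labels and mapping everything else to `O`.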
Similar resources
Un segmenteur-étiqueteur et un chunker pour le français (A Segmenter-POS Labeller and a Chunker for French) [in French]
We propose a demo of two software tools: a Segmenter-POS Labeller for French and a Chunker for texts processed by the first program. Both have been learned from the French Tree Bank. Keywords: POS tagging, chunking, machine learning, French Tree Bank, CRF.
Building and exploiting a French corpus for sentiment analysis (Construction et exploitation d'un corpus français pour l'analyse de sentiment) [in French]
This work introduces a French corpus for sentiment analysis. We describe the construction and organization of the corpus. We then apply machine learning techniques to automatically predict whether a text is positive or negative (the opinion classification task). Two techniques are used: logistic regression and classification based ...
Can we chunk well with bad POS labels? (Peut-on bien chunker avec de mauvaises étiquettes POS ?) [in French]
In this paper, we test two distinct approaches to chunk transcribed oral data, trying to minimize the phases of manual correction. First, we use an existing chunker, learned from written texts, then we try to learn a new specific chunker from a small amount of manually corrected labeled oral data. The purpose is to reach the best possible results for the chunker with as few manual corrections o...
Pre-processing and Language Analysis for Arabic to French Statistical Machine Translation (Traduction automatique statistique pour l'arabe-français améliorée par le prétraitement et l'analyse de la langue) [in French]
A Named Entity recognizer for French (Un reconnaisseur d'entités nommées du Français) [in French]
We propose to demonstrate a French named entity recognizer trained on the French TreeBank enriched with named entity annotations. Keywords: NER, POS, machine learning, French Treebank, information extraction, CRF.